Impact of memory hierarchy on program partitioning and scheduling
Abstract
In this paper we present a method for determining the cache performance of the loop nests in a program. The cache-miss data are produced by simulating the execution of each loop nest on an architecturally parameterized cache simulator. We show that the cache-miss rates are highly non-linear with respect to the loop ranges and correlate well with the performance of the loop nests on actual target machines. The cache-miss ratio is used to guide program optimizations such as loop interchange and iteration-space blocking. It can also be used to estimate the runtime of a program. Both applications are important in scheduling programs for parallel execution. We present examples of program optimization for several popular processors, such as the IBM 9076 SP1, the SuperSPARC, and the Intel i860.
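To make the idea of a parameterized cache simulator concrete, here is a minimal Python sketch. It is not the simulator from the paper: the `Cache` class, the `sweep_miss_ratio` helper, the LRU replacement policy, and the chosen cache sizes are all assumptions made for illustration. It replays the addresses touched by a two-deep loop nest over a row-major `n` x `n` array and reports the miss ratio for either loop order.

```python
from collections import OrderedDict


class Cache:
    """Hypothetical set-associative cache model with LRU replacement.

    size_bytes, line_bytes and ways are the architectural parameters;
    the values used below are illustrative, not those of any real CPU.
    """

    def __init__(self, size_bytes, line_bytes, ways):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One ordered dict per set; insertion order tracks recency of use.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.accesses = 0
        self.misses = 0

    def access(self, addr):
        """Record one memory reference to byte address addr."""
        self.accesses += 1
        line = addr // self.line_bytes
        cache_set = self.sets[line % self.num_sets]
        if line in cache_set:
            cache_set.move_to_end(line)  # hit: mark as most recently used
        else:
            self.misses += 1
            if len(cache_set) >= self.ways:
                cache_set.popitem(last=False)  # evict least recently used
            cache_set[line] = True

    def miss_ratio(self):
        return self.misses / self.accesses if self.accesses else 0.0


def sweep_miss_ratio(n, row_order, cache, elem_bytes=8):
    """Simulate one sweep over a row-major n x n array.

    row_order=True makes the column index the inner loop (unit stride);
    row_order=False models the interchanged loop nest (stride of n elements).
    """
    for outer in range(n):
        for inner in range(n):
            i, j = (outer, inner) if row_order else (inner, outer)
            cache.access((i * n + j) * elem_bytes)
    return cache.miss_ratio()


if __name__ == "__main__":
    for n in (4, 64):
        row = sweep_miss_ratio(n, True, Cache(1024, 32, 1))
        col = sweep_miss_ratio(n, False, Cache(1024, 32, 1))
        print(f"n={n}: row-order miss ratio={row}, column-order miss ratio={col}")
```

With the assumed 1 KiB direct-mapped cache and 32-byte lines, both loop orders behave identically while the array fits in the cache (n = 4), but at n = 64 the column-order sweep misses on every access while the row-order sweep still misses only once per line. This is a small instance of the non-linear dependence on loop ranges, and of why loop interchange can pay off.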
Similar resources
Low-Power L2 Cache Architecture for Multiprocessor System on Chip Design
A significant portion of cache energy in a highly associative cache is consumed during tag comparison. In this paper tag comparison is carried out by predicting both cache hits and cache misses using a multistep tag comparison method. A partially tagged Bloom filter is used for cache miss prediction by checking the non-membership of the addresses, and a hotline check is used for cache hit prediction by reducing...
International Journal of Emerging Trends in Engineering and Development Issue 3, Vol.2 (May 2013) Available online on http://www.rspublication.com/ijeted/ijeted_index.htm ISSN 2249-6149
Carrot-hole Data Scheduling and Adaptive Partitioning for Memory Traffic Minimization
Massive uniform nested loops are broadly used in scientific and multi-dimensional Digital Signal Processing applications. Due to the amount of data handled by such applications, cache or on-chip memory is required to improve data access and overall system performance. Most existing application-specific systems do not efficiently optimize access to the different levels of the memory hierarchy. In...
Memory Architectures for NoC-Based Real-Time Mixed Criticality Systems
Mixed criticality systems (MCS) allow software components of differing criticalities to use the same physical resources (i.e., CPU, memory). MCS highlight the trade-off between partitioning components of different criticalities and efficient resource usage. Components are partitioned due to safety concerns, but physical partitioning requires more resources than if components are unpartitioned and...
A New Memory Monitoring Scheme for Memory-Aware Scheduling and Partitioning
The memory hierarchy in modern computing systems is typically time-shared and space-shared amongst multiple processes and threads, some of which execute simultaneously. Memory contention can significantly degrade the performance of running processes. Cache hit counters found in modern microprocessors provide a limited picture of the memory needs of processes. We propose a low overhead, on-line...